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Analytical Derivation of Software Failure Regions 1 



Timothy J. Shimeall 
MAJ John Manning Bolchoz, US Army 
CDR Rachel Griffin, US Navy 
Computer Science Department 
Naval Postgraduate School 
Monterey, CA 93943-5100 

Abstract 

This paper proposes an analytical method for deriving software failure regions, which 
are regions of the input space that are mapped failures by specific faults. Previous studies 
have used empirical rather than analytical approaches to derive failure regions. A manual 
technique is presented and proven to produce the necessary and sufficient conditions of a 
fault being executed and leading to a failure. Semiautomated tools to assist in the manual 
technique are discussed, as is the use of failure regions in regression testing. 

Index terms: Software Testing, Formal Models, Software Faults, Software Failures, Soft- 
ware Tools 



1 Introduction 

It is widely recognized that many faults in installed software are introduced during software 
maintenance. For example, a recent change to the switching software used in the US 
telephone system inadvertently introduced a three-line fault that interrupted service to 
customers in six states. [17] Regression testing is the process of testing software to determine 
if a modification has introduced a fault. Currently, there is no consistent theoretical 
basis for the conduct of regression testing, forcing maintainers to rerun all previous tests 
(in the phone system example, a 13-week process) to determine if any previously-correct 
results have been affected during maintenance. One reason for modifying software during 
maintenance is to correct known faults in the software. This paper describes and verifies 
a technique for predicting the domain of input that produces results affected by a specific 
fault in a specific piece of software. This technique is supported by a set of software tools, 

lr The research reported here was conducted with funds administered by the Naval Postgraduate School 
Research Council. 
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also described in this paper. Knowledge of this input domain focuses the testing needed 
to determine if correction of the fault corresponding to the domain has introduced further 
faults. For convenience of presentation, this paper assumes that the software to be analyzed 
is written in an ALGOL-derived language. 

In this paper, a fault is an erroneous piece of program source code. An error is a 
discrepancy between a computed, observed or measured value or condition and the true, 
specified or theoretically correct value or condition, the immediate result of the execution 
of a fault. A failure is the termination of the ability of a functional unit to perform its 
required function (i.e., a failure is either the production of a value that contradicts some 
portion of the specification, or the lack of production of a specified value). [1] 

A software failure region is the set of all input values that are mapped by an individual 
program fault onto any failure. Each fault in a program, by definition, is associated with a 
single failure region, which may be composed of several disjoint or overlapping subregions. 
A single failure region may be associated with more than one fault, these faults being 
differentiated by varying effects on the program results. Different faults may also map 
distinct failure regions onto the same set of program failures. See figure 1 for a diagram of 
these failure region - fault - set of failure associations. 

This paper describes a manual technique for analytically deriving a failure region from 
a known fault in a program’s source code. This technique views a failure region as bounded 
by the conjunction of three boolean conditions. The first, reachability, states the conditions 
that hold when the program executes the section of source code containing the fault. The 
second, generation , states the conditions that cause the fault to produce an erroneous 
internal state (e.g., bad variable values). The third, propagation, states the conditions 
under which the erroneous internal state becomes a failure. 
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Program 




Figure 1: Associations between Failure Regions, Faults and Failures 



If software developers can efficiently determine a software failure region, regression 
testing would be simplified. The bounding conditions of a failure region are the necessary 
and sufficient conditions for a failure due to the associated fault. The regression-testing 
subtask of evaluating if the fault is fully corrected may be viewed as applying a sample 
from the failure region to the program and examining the results for failures. Instead of 
simply repeating previous tests, testers could apply intelligent sampling techniques on the 
failure region. The regression-testing subtask of evaluating if the fault correction has not 
introduced new faults may also be facilitated by knowledge of the failure region. This 
latter question is the subject of ongoing research[6]. 

The following section briefly reviews research that has involved failure regions or con- 
cepts similar to failure regions. Section three describes a manual method for analytical 
derivation of the failure regions. Section four describes automated tools that have been de- 
veloped to aid this analysis. This paper concludes with a discussion of specific applications 
of failure regions to regression testing. 

2 Failure Regions 

Software failure regions are sets of program inputs, one set for each fault in the program 
and each element in each set a complete series of inputs that causes the program to execute 
the corresponding fault and produce a failure due to the fault. These sets are always finite, 
given the limitations on representation in a finite machine, but are often too large for 
tractable enumeration. Each element of each set may be of complex structure. Due to 
these considerations, this paper will denote failure regions by the boolean conditions that 
identify their elements. 

Ammann and Knight first described software failure regions in research on software fault 
tolerance[3]. They identified and characterized failure regions by repeated sampling across 
a plane through the program input space. They used plots of the identified portions of the 
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failure regions as a basis for the proposal of a data-variation fault-tolerance technique. No 
attempt was reported to identify the bounding conditions of the failure regions. Dunham 
and Finelli applied this sampling procedure to other software[4], referring to failure regions 
to as ‘error crystals’. 

In parallel with this fault-tolerance-based interest in the effects of known faults, testing- 
based research has investigated conditions that would lead to detection of suspected or 
hypothesized faultsfll, 18, 5, 12]. The testing techniques that have been derived from that 
research seek to identify at least one data set that would reveal a fault, rather than in 
deriving the necessary and sufficient conditions to identify all data that would reveal the 
fault. These two problems are clearly related, but they are not identical, and it is the latter 
problem that is the subject of this paper. 

Voas[16] investigated the dual of the problem discussed in this paper, namely iden- 
tification of the portions of the program text that are most likely to contain faults by 
examination of where faults are likely to be propagated or obscured. This probabilistic 
work has focussed principally on missing-statement faults and the insights from that work 
are currently under examination using software failure regions[6]. 

While there has been broad interest in the conditions that lead to fault detection, no 
one has previously proposed a technique to derive the necessary and sufficient conditions 
for a fault to cause a failure. Such a technique is the focus of the next section. 

3 Manual Technique for Deriving Failure Regions 

3.1 Assumptions 

There are several assumptions that are implicit in the failure regions analysis technique. 
These assumptions may limit the applicability of the technique and are provided here to 
give context to the technique description that follows. 

The first assumption is that the region is to correspond to a known fault. The region 
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analysis technique does not detect faults, but generates failure regions for previously- 
detected faults. Myers[10] conjectured that the sorts of information that appear in a 
failure region’s bounding conditions may be used to detect other faults, but this has yet to 
be empirically determined. The next section describes the characteristics of a fault used 
in application of the region analysis technique . 

The second assumption is that, while deriving the failure region corresponding to a 
given fault, the effects of other faults may be ignored. In the early stages of program 
testing, this assumption may be quite restrictive, and research is anticipated to derive 
failure regions for multiple faults simultaneously. 

The third assumption is that the analyst knows the loop effects for each loop in the 
program, whether iterative or recursive. Specifically, the technique assumes the analyst 
can indicate what changes in the local data state of a program execution are made by 
a loop. This may be easier to define then a loop invariant. While identification of loop 
effects is impossible in the general case, since not all loops terminate or can be proved 
to terminate, in many cases the source code, design or requirements documentation will 
readily provide this information. 

The fourth assumption is that the program under consideration is a sequential program 
with soft -real- time constraints. The program is assumed sequential to avoid consideration 
of interleaving and interference effects. It is assumed under soft-real-time constraints to 
allow analysis for excessive looping faults, but to avoid consideration of explicit timing 
values during the analysis. Research is underway to extend the analysis technique to deal 
with concurrent and hard-real-time software. 

The fifth assumption is that the virtual machine provided by the language processor, 
hardware and operating system will not differ from its specified operation in a manner 
that affects the analysis. In practice, this assumption means that either the program un- 
der analysis avoids any problems with the virtual machine, or that the developers have 
corrected these problems prior to application of the region analysis technique on the appli- 
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cation software. It is not assumed that the virtual machine is an ideal machine, nor that 
it has limitless precision and storage, so that the technique may analyze faults that lead 
to problems revealed by virtual machine limits. 

The last assumption is that undefined variables will not be coincidentally correct. We 
introduce this assumption to avoid system-specific probabilistic arguments as to the con- 
tents of memory. In practice, this assumption introduces a degree of overestimation into the 
failure region analysis, but in this first formulation of the failure region analysis technique, 
we view that overestimation as tolerable. 



3.2 The Fault and Program Models 

In this paper, a fault is an erroneous, possibly non-contiguous, portion of the program 
source code. A fault may be undesirable extra statements, a miswritten set of statements, 
or a lack of needed statements. A fault is characterized for the region-analysis technique 
by three pieces of information. The first is its location, which is either the statement most 
closely enclosing the fault or the first statement encountered in the course of execution that 
uses values affected by the fault. Of these two alternatives, the first is most tractable in 
analysis, but can’t be defined for some missing-code faults. Should there be no statement 
that uses values affected by the erroneous code portion, then if that section causes the lack 
of an output, the analysis treats it as a self-revealing fault (see section 3.3); otherwise if 
that section does not cause the lack of an output, then it is not a fault to be analyzed by 
this technique. 

The second piece of information used to characterize a fault is the list of variables 
that the immediate execution of the fault gives erroneous values to, without any interven- 
ing action by any other section of the source code. These erroneous values (henceforth 
called errors ), are viewed as being stored in atomic variables, slices of arrays or files with 
known limits, individual fields of a record structure, indirect-referenced data with all aliases 
known, and entire sets, strings or variant structures. Should a fault not produce an error, 
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but otherwise cause the program to fail, the analysis treats it as a self-revealing fault. 

The third piece of information used to characterize a fault is the conditions under which 
it will generate an error. These conditions are represented as a boolean expression on the 
local execution state of the module containing the fault’s location. The analysis should 
include the known limits on slices of arrays or files in this expression. 

More formally, then, the analysis technique uses a characterization of a fault as a 3- 
tuple: 

< Lp, Vp, Cp > 

where Lp is the location of the fault, Vp is the list of variables that form the error caused 
by the fault and Cp is the the conditions under which the fault causes the error, described 
by a Boolean expression. In practice, this characterization means that the fault must be 
identified specifically prior to the analysis, but not necessarily corrected. This character- 
ization is not unique, as equivalent Boolean expressions to any Cp may be constructed 
by negation (i.e., ->(->Cp) = Cp ). Such an equivalence will not change the results of the 
analysis technique. 

A program is characterized by the variables, modules and statements that are its con- 
stituent parts. Modules and variables are identified by unique names (if necessary, by 
associating them with their defining scope). Statements are identified by their order of 
occurrence within the program source code, and may enclose subsidiary statements. The 
input and output statements of the program are explicitly identified. Formally, then, a 
program is characterized as a 6-tuple: 

< Vp, Rp , A/p, 5p, /p, Op > 

where Vp is the list of variables defined by the program, Rp is a subset of Vp indicating 
variables that are input, A/p the list of modules, each having the same structure as the 
program, Sp is the list of statements, Ip the subset of Sp that are input statements, Op 
the subset of Sp that are output statements. An input statement is defined as one where 
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variables receive values from a source external to the program. An output statement is 
defined as one where messages and variable values are transmitted to a sink external to 
the program. The initial statement of a program is not considered an input statement, 
nor is the final statement of the program considered an output statement, unless these 
statements are members of Ip or Op, respectively. 

A statement 5 is formally characterized as a 7-tuple: 

< T s , D s , Rs, Us, C s , E s , E' s > 

Ts is the text of the statement, exclusive of any enclosed statements. D$ is a list of the 
variables that are defined (given values) by the statement. Rs is a list of the variables that 
are referenced by the statement. Us is a list of the variables that become undefined (lose 
their value) at statement S. Cs is the portion of Ts that forms a boolean condition. Es and 
E's are lists of statements enclosed by the statement and executed if Cs evaluates to true 
or false, respectively. Any or all of Ds, Rs > E$ and E' s may be the empty list. Should the 
statement not contain a boolean condition, Cs will be ‘true’. A path is a set of statements, 
a subset of the union of all statements in the program and its submodules. The elements 
of a path are all statements visited during some specific portion of a program execution, 
and some elements of a path may also appear as substatements of other elements of the 
path. 

Lastly, the type BooleanExpression, used in the sections following, is an expression 
in the first-order predicate calculus, maintained as an expression rather than being reduced 
to a Boolean value. 

3.3 Overview of the Analysis Technique 

This paper describes the failure regions analysis technique using a pseudocode similar to 
Ada. This description is for compactness and formality, rather than for implication of 
automatic implementation of the analysis technique. 
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In a broad view, the analysis technique is made up of four steps: 1) deriving reachability 
conditions, 2) merging error-generation conditions obtained during fault identification, 3) 
deriving propagation conditions and 4) post-processing to simplify the region conditions 
and eliminate non-input variables. A process-program arrangement of these steps appears 
in Figure 2. A self-revealing fault, one that always causes an immediately visible result 
that contradicts the specification (e.g., an abnormal program exit) or that always prevents 
a specified result, omits the analysis of error propagation conditions (i.e., its propagation 
condition is ‘true’). 

procedure DeriveRegion (in F:Fault; in PtProgram; 

out region : BooleanExpression) is 
r: BooleanExpression := 'true' 
q: BooleanExpression := 'true' 
for each i 6 Ip loop 

for each path t from i to Lp loop 
r := r V Reach (t,Lp) 
end loop 
end loop 

r := r A Cp -- obtained from debugging 
if F is not a self -revealing fault then 
for each o G Op loop 

for each path t from Lp to o loop 
q := q V Propagate (F, t, o) 
end loop 
end loop 
end if 

r := Reduce (r A q, V P — R P ) 
region := Simplify (r) 
end DeriveRegion 

Figure 2: Top-Level Description of Region Analysis 



The technique requires the user to select paths for the derivation of the reachability and 
propagation conditions. The assumptions of soft-real-time program character and isolation 
of faults combine to ensure that the analysis will terminate (i.e., that it need not consider 
infinite or excessive looping prior to the fault location). 
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3.4 Analysis for Fault Reachability Conditions 

The first analysis step derives the fault’s reachability conditions along a specific path. 
The analysis invokes this step iteratively, as described in figure 2. This step forms a 
composition of all the conditions connecting one statement in the path to another. One 
group of conditions occur in the text of loop, if, and multiple-branch conditional (case) 
statements. This group will henceforth be called referencing clauses. 

Another group of conditions, henceforth called defining clauses, occur due to the effect of 
assignment statements and procedure or function calls. These clauses represent the changes 
in variable values that occur during program execution. The analysis uses defining clauses 
to record transformation of the program state, so that the analyst may change conditions 
that occur in the program (involving a mix of input values, constants and current values of 
local variables) to conditions that involve only input values and constants. It is the latter 
conditions that form the desired bounds on the failure region. Most defining clauses will 
not appear in the final failure region bounds, but only in the intermediate forms leading 
to those final bounds. 

The reachability condition analysis, then, traverses the statements in the path, form- 
ing a composition of the defining and referencing clauses. This traversal is described in 
figure 3. The formation of the composition is guided by the known location of the fault, 
and simplified to deal with that specific set of code. Loops in the path are dealt with 
by applying the exit condition (if the loop body is not in the path) or applying the loop 
effects (which will be represented by Effect (s,c) below, where s is a loop statement 
operating in a state characterized by c) and then extracting the loop body from the path. 
If the fault location is part of the loop body, then the traversal algorithm applies the loop 
effects to emulate any iterations prior to the fault and extracts any paths through the 
loop body except from the loop initiation to the fault location. Other control statements 
axe dealt with by incorporating their enclosed conditions in the reachability condition and 
traversing their enclosed statements (if any). Non-control statements are dealt with using 
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the Semantics operation. 



function Reach (in f:path; in g: statement) 
returns BooleanExpression is 
cur: BooleanExpression := 'true' 
flat: set of statements := 0 
flatn : set of statements := 0 
for each statement 3 € t loop 

flat := flatten (E,) 

flatn := flatten (E' s ) 

if s is a loop A E, C t A g € flat then 
cur := cur A Effect (s, cur) A C, 
t := (i-flat) U {s' | 3' € flat A g 6 flatten (3')} 

elseif 3 is a loop A g 6 flat then 
cur := cur A C, 

elseif s is a loop A E, C t then 
cur := cur A Ef feet (s, cur ) 
t : = t — E, 

elseif s is a loop A E a (£ t then 
cur := cur A ->C 3 

elseif E s ^ 0 A (E, C t V g € flat ) then 
cur := cur A C, 

elseif E' 3 0 A ( E\ C t V g 6 flatn) then 

cur := cur A ~^C 3 

else cur := Reduce (cur, D 3 ) A Semantics {T at cur) 

end if 
end loop 

return Simplify (cur) 
end Reach 



Figure 3: Description of Reachability Analysis 



The derivation of the reachability and propagation conditions make use of an operation 
referred to as Semantics and an inclusion-expanding function flatten. Semantics 
represents the operation of the underlying virtual machine. The analyst is assumed to 
take the text of a statement and a context for evaluation to obtain a Boolean expression 
describing the results of evaluation. Methods for implementing Semantics in practice 
include symbolic execution^ and code reading[9]. 

The function flatten takes a list of statements and returns the union of that list with 
all of the statements enclosed by any element of that list. The algorithm for flatten is 
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given in figure 4. 

function flatten (in L : set of statements) 
returns set of statements is 
cur: set of statements := 0 
working: set of statements := L 
while working ^ 0 loop 
s 6 working 

working := (working— { 5 }) U E, U E' 3 
cur := curUfijUi 1 ' U {sj 

end loop 
return cur 

end flatten 



Figure 4: Flatten Algorithm 



3.5 Analysis for Error Propagation Conditions 

After the analysis of reachability conditions and the merger of the error generation con- 
ditions that were obtained during fault identification, the next step in the failure region 
derivation is to determine when (if ever) the value of each output is affected by the fault. 
Figure 5 provides an overview of the error-propagation analysis. 

The starting point for this analysis is a list of variables with erroneous values that are 
immediate results of the fault (denoted by Vp in figure 5). This list forms the initial ‘con- 
tamination list’ for the analysis. The analysis propagates the contamination list separately 
along each possible path from the location of the fault to each program output. In each 
specific path, the analysis expands the contamination list as variables receive values de- 
rived from one or more variables in the contamination list, the analysis removes variables 
from the contamination list as they become undefined or as program statements give them 
new values that do not derive from variables in the contamination list. If the last output 
statement in the path references variables in the contamination list, then the analysis re- 
turns the path condition as a partial failure region condition. If the output statement does 
not reference any of the variables in the contamination list, or if the contamination list 
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becomes empty, then the path is not incorporated in the failure region conditions. 

3.6 Post-Processing of Analysis 

To be a practical source of information for testing, the bounding conditions of a software 
failure region must be simplified as much as feasible without loss of information and the 
conditions must involve only variables that are of interest to the testers, typically the pro- 
gram input variables. There are two operations that are used to convert the conjunction 
of conditions derived from the program text into a useful failure region bound. The Sim- 
plify operation is a conventional Boolean simplification. The Reduce operation takes a 
Boolean condition (involving both defining and referencing clauses) and a list of variables. 
It returns a Boolean condition in which each reference to any of the variables on the list 
has been replaced with the associated definition, and any definition of the variables on the 
list has been eliminated. 

The preceding description of the Simplify and Reduce operations does not imply 
that these operations are to be fully automated. In practice, the analyst may need to 
intervene to replace complex or intertwined clauses with simpler forms derived from the 
program specification, design or source code. This has been found to be particularly true 
when the input data control the values of program variables only in an indirect fashion. 

3.7 Example of Deriving A Software Failure Region 

As an example of deriving failure regions from the source code, consider the Pascal function 
AcuteAngle in figure 6. This function is specified to determine if the angle subtended 
by a rectangle centered at an arbitrary point ( Xt,Yt ) when viewed from the origin (i.e., 
the largest angle with the vertex at the origin and two of the corners of the rectangle as 
the endpoints) is less than | radians. The width (w) and length (/) of the rectangle are 
parallel to the x and y axes, respectively. (See figure 7 for an illustration of a rectangle 
and origin layout, where a is the angle to be measured.) This function is derived from 
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function Propagate (in F: Fault; in (:path; 

in o: statement) returns BooleanExpression is 
cur: BooleanExpression := 'true' 
contamlist: list of variables := Vp 
flat: set of statements := 0 
flatn: set of statements := 0 
for each statement s 6 t loop 
if contamlist is null then 
cur := false 
exit loop 
end if 

flat := flatten (E a ) 
flatn := flatten (£') 

if s is a loop AE a C t A Lp 6 flat then 
cur := curAEf feet (s , cur) A C, 
t := (t— flat) U {s' | s' € flat ALp 6 flatten (5')} 
elseif ^ is a loop ALp Eflat then 
cur := curACj 

elseif 5 is a loop AE a C t then 
cur := curA Effect {s, cur) 
t : = t — E, 

elseif 5 is a loop A E a <£. t then 
cur := curA-iC, 

elseif E s / 0 A ( E 3 C t V Lp € flat) then 
cur := curAC a 

elseif E' a 0 A {E' a C t V Lp E flat ) then 
cur := cur A ->C a 

else 

cur := Reduce (cur, D a ) A Semantics (T a , cur) 

end if 

if contamlistni? a ^ 0 then 

contamlist := contamlistU/ 5 , 
elseif contamlistnD 3 7^ 0 then 
contamlist := contamlist - D a 

end if 

contamlist := contamlist - U a 

end loop 

if contamlistfli? 0 = 0 then return false 
else return Simplify (cur) 

end if 

end Propagate 

Figure 5: Description of Propagation Analysis 
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code developed for a software experiment[15]. The particular implementation in figure 6 
has a fault at line 13, where the corners used to calculate the subtended angle are those 
identified as B and C in figure 8, rather than the proper B and D. Assume that the only 
input statement is at line 1 and the only output statement is at line 29. The conditions 
for reaching line 13 are: 

X* = XT - f A Y a = YT + 

X B = XT+f AY b = YT+$A 
X c = XT - j A Kc = YT — 

X D = XT + | A Y d = YT - 
Yd < 0 A Y b > 0 A Xp <0 

The first four lines are defining clauses, the last line a referencing clause. 

The conditions under which the code produces an erroneous value due to the substitu- 
tion of C for D are those where the angle formed by B, C, and the origin (angle COB) is 
different from B, D and the origin (angle DOB), or: 

Yc_ Yd_ 

X c X D 

which, since Yc = Yp, simplifies to Xp > Xc, which reduces to w > 0. Lastly, the 
conditions under which the erroneous value isn’t masked by the comparison at line 29 in 
figure 6 are where angle COB is less than | and angle DOB is greater than or: 

( tan_1 ^^ ~ tan-1 (|:)) - \ < ( tan_1 ^^ ” tan_1 (||)) 

Combining these inequalities, substituting for defining clauses and simplifying yields the 
result: 

(^ < YT < i) A X T > A w > 0A 

( tan_1 (^?rf ) - tan_1 (x?zf-)) < f < ( tan_1 (Y?rf ) - tan_1 ( 

A cross-section of this region appears as the cross-hatched portion of figure 8 for a particular 
set of values for Xx, Yj, w and l. As we vary those four values, the failure region forms a 
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tetrahedral cylinder through the four-dimensional input space, with a rotating tetrahedral 



groove through one face. 



function AcuteAngle (XT, YT, W, L : real) :boolean; 
const Pi=3. 1415926535; 
var AX, AY, BX, BY, CX,CY, DX, DY : real; 

Angle, HalfWidth, Half Length: real; 

begin 

1 HalfWidth := W / 2; Half Length := L / 2; 

2 AX := XT-HalfWidth; AY : =YT+Half Length; 

3 DX := XT+HalfWidth; DY : =YT-Half Length; 

4 BX := DX; BY : =AY; 

5 CX : = AX; CY:=DY; 

6 if ( (YO > BY) and (XO > BX) ) 

7 Angle := abs (arctan (DY/DX) 

8 else if ( (XO < CX) and (YO < 

9 Angle := abs (arctan (AY/AX) 

10 else if ( (YO > AY) and (XO < 

11 Angle := abs (arctan (CY/CX) 

12 else if ( (YO > DY) and (XO > 

13 Angle := abs (arctan (BY/BX) 

14 else if XO > BX then 

15 Angle := abs (arctan (DY/DX) 

16 else if XO < AX then 

17 Angle := abs (arctan (AY/AX) 

18 else if (YO >= AY) then 

19 if (XO = AX) then Angle := Pi/2 - abs (arctan (BY/BX) ) 

20 else if (XO < BX) then 

21 Angle := Pi - abs (arctan (AY/AX) ) - abs (arctan (BY/BX) ) 

22 else Angle := Pi/2 - abs (arctan (AY/AX) ) 

23 else if (YO <= CY) then 

24 if (XO = DX) then Angle := Pi/2 - abs (arctan (CY/CX) ) 

25 else if (XO > CX) then 

26 Angle := Pi - abs (arctan (DY/DX) ) - abs (arctan (CY/CX) ) 

27 else Angle := Pi/2 - abs (arctan (DY/DX) ) 

28 else Angle := 2 * Pi; 

29 AcuteAngle := Angle <= (Pi/2); 

end; 



then 

- arctan (AY/AX) ) 
CY)) then 

- arctan (DY/DX) ) 
AX) ) then 

- arctan (BY/BX) ) 
DX) ) then 

- arctan (CY/CX) ) 

+ arctan (BY/BX) ) 
+ arctan (CY/CX) ) 



Figure 6: Function AcuteAngle 
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(XT,YT) 




•* w 



Figure 7: Parameter Layout for Function AcuteAngle 




Figure 8: Failure Region in Function AcuteAngle 



3.8 Validation of the Analysis Technique 

Section 1 defined a failure region as bounded by the necessary and sufficient conditions for 
a fault being executed and leading to a failure. This section proves the following theorem: 

Theorem 1 Given the assumptions in section 3.1 and that the fault detection process 
accurately supplies Lp, Vp and Cf, and that the Semantics function accurately supplies 
the effect of its argument statement on its argument environment, DeriveRegion, if 
it terminates, will produce necessary and sufficient conditions for executing a fault and 
causing it to lead to any failure. 

This theorem is the conjunction of the following two propositions: 

Proposition 1 (Necessity) All input sequences that satisfy the conditions produced by 
DeriveRegion will cause the fault to be executed and lead to some failure. 

Proposition 2 (Sufficiency) There are no input sequences that fail to satisfy the con- 
ditions produced by DeriveRegion and cause the fault to be executed and lead to any 
failure. 

Both of these propositions may be conveniently proved by contradiction. The negation 
of Proposition 1 assumes that some input satisfies the conditions generated by DeriveRe- 
gion but does not execute the fault and lead to a failure. This may only happen due to 
one or more of the following: 

nl The input does not cause the code at the fault location to be executed. 

n2 The input does not cause the fault to generate an error. 

n3 The input does not execute a path from the fault to any output statement. 

n4 The input maps the error generated by the fault into a correct state along the 
path to each output statement. 
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Option nl may occur iff at some statement prior to Lp, the condition on the branch 
that leads to Lp was false in the state derived from the input. That implies that there 
exists some input statement i € Ip such that a path from i to Lp exists, but that path is 
not included in the derivation of failure region bounds, or the conditions along the path 
were incorrectly analyzed by Reach. The implication that the path was not included is in 
conflict with the ‘all paths from i to Lp' loop in DeriveRegion. 

The implication that the conditions along a path were incorrectly analyzed by Reach 
implies two possibilities: the referencing clauses were incorrectly analyzed or the defining 
clauses were incorrectly analyzed. The possibility that the defining clauses were incorrectly 
analyzed is a contradiction of the assumption that Semantics accurately captures the 
effects of the statement. The possibility that the referencing clauses were incorrectly an- 
alyzed is shown a contradiction by examination of the condition handling in Reach (the 
if structure inside of the for each loop). 

In any path through the text of a program, there are three possible traversals of a 
single-entry, single-exit section of the text : the section could be completely traversed, 
the path could terminate inside of the section, or the path could include none of the 
statements in the section. Complete traversal of a code section is dealt with by the third, 
fifth and sixth if clauses in Reach, in which the conditions for entry of the code section 
are included in the path conditions. Partial traversal of a code section is dealt with by 
the first, second, fifth and sixth if clauses, where the presence of the path termination is 
tested, and if present, the conditions for entry of the code section are included in the path 
conditions. Non-traversal of a code section is dealt with by the for each loop, which 
restricts construction of path conditions to the statements in the path, and the fourth 
if clause, which incorporates the loop exit conditions if the body is not executed. Thus 
in each case the appropriate conditions are included in the path condition, so incorrect 
analysis by Reach is shown to be a contradiction. 
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Option n2 is a contradiction of the assumption that Cf has been accurately captured 
by the fault detection process, and hence may be neglected. 

Option n3 may occur iff at all statements reachable from Lp, the branches that include 
any member of the set of program output statements required conditions that were found 
to be false in the state derived from the input. That implies that there exist paths from 
Lp to at least one output statement o, but these paths are not included in the derivation 
of the failure region bounds or the conditions along the path were incorrectly analyzed by 
Propagate. The implication that the path was not included is in conflict with the ‘all 
paths from Lp to o' loop in DeriveRegion. The implication that the conditions along 
the path were incorrectly analyzed by Propagate may be shown to be a contradiction 
through argument paralleling the preceding discussion of Reach. 

Option n4 may occur iff at some point on the path selected by the input after the 
execution of Lp, all variables affected by the fault and subsequent erroneous calculation 
have had their values replaced by values resulting from calculation or reference with solely 
unaffected variables. This conflicts with the first if inside the for each loop in Prop- 
agate, which, by returning ‘false’, precludes input that selects such paths from inclusion 
in the failure region. 

Since each possible cause of the negation of Proposition 1 has been shown to be a 
contradiction, that proposition is found to be true. DeriveRegion is thus proven to 
generate necessary conditions for a fault to be executed and lead to a failure. 

The negation of Proposition 2 assumes that some input exists that does not satisfy the 
conditions generated by DeriveRegion but does lead the fault to produce some failure. 
This may only happen through one or more of the following: 

si The input causes program execution along a path from an input statement 
through the fault location that was not considered in DeriveRegion. 
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s2 The conditions derived from the input cause the fault to generate an error in a 
manner not considered in Cp. 

s3 The input causes program execution along a path from the fault location to an 
output statement that was not considered in DeriveRegion. 

Option si is shown to be a contradiction through an argument that parallels the refuta- 
tion of option nl. Option s2 is a direct contradiction of the assumption that Cf has been 
accurately captured by the fault detection process. Option s3 is shown to be a contra- 
diction through an argument that parallels the refutation of option n3. All three options 
being shown as contradictions, DeriveRegion is proven to generate sufficient conditions 
for a fault to be executed and lead to a failure. 

This proof of necessary and sufficient conditions is subject to the assumptions made 
at the start of the section. These assumptions limit the applicability of the manual tech- 
nique to those contexts that can reasonably satisfy them. If the technique is applied in a 
context where these assumptions do not hold, it may generate the necessary and sufficient 
conditions, but this proof does not hold. 

4 Automated Support for Failure Region Analysis 

While the manual process allows for the derivation of software failure regions, it is too 
mentally taxing and too slow for use with large or complex software. To facilitate and 
accelerate failure region derivation, five semiautomated software tools have been developed 
to support the failure region analysis: REACHER, WALKER, FALTER, SPACER and VIEWER. 
A flow diagram of the use of these tools is in figure 9. 

These tools are not intended to replace a human analyst, but rather to support the 
analyst in the derivation of failure regions. At any point in the analysis process, the analyst 
may intervene to replace conditions, change the source code fragment being analyzed, or 
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Figure 9: Flow Diagram of Support Tools 



examine previous analysis results. At some points in the process, the analyst is required 
to provide information that is not automatically determinable. 

Two tools support the reachability analysis process. REACHER parses Pascal programs 
and builds a control flow graph. This control-flow graph is a data-structure representation 
of the program model given in section 3.2. Using the control-flow graph, the conditions 
under which each statement in the program may be reached may be derived. 

REACHER is constructed using LEX and YACC and a specification for Pascal based on 
the ISO standard. In a single non-interactive pass over the Pascal source code, REACHER 
creates a list of blocks corresponding to each procedure and function. Each block contains 
information such as the declaration text for each block, the sub-blocks belonging to or 
nested within each block and a pointer tracing the block back through the thread of its 
definition. 

Concurrent with the construction of the linked list of blocks is the construction of a 
linked list of nodes that correspond to the statements in the program. It is this list that 
results in the control-flow graph. For example, consider the Pascal fragment in Figure 10. 

begin 

readln (x) ; 

if x > 3 then 

y := 10 
else 

y := 20; 
writeln(x, y) 

end 



Figure 10: Example Pascal Fragment 



The list would contain nodes for begin-end, the readln procedure call, the if- 
then-else, the two assignment statements and the writeln procedure call. The nodes 
are linked as shown in figure 11, forming the control-flow graph. Each node is annotated 
with the conditions under which program flow may proceed to each succeeding statement. 

Once REACHER has constructed the initial control-flow graph, annotated with the im- 
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Figure 11: Node Linkages in Example Pascal Fragment 



mediate branching conditions, WALKER is used to collapse unneeded nodes and to collect 
the immediate branching conditions into the node reachability conditions. Unneeded nodes 
axe those unrelated to the fault being analyzed. They are collapsed by merger of the node 
information in the internal data structure. The collection of branching conditions into 
reachability conditions is accomplished by a user-guided loop [0,1] traversal as described in 
section 3.4. By default WALKER will ‘and’ together the branching conditions of the nodes 
traversed. If the user finds the resultant conditions undesirable, WALKER allows the user to 
replace the current condition with one of the user’s choosing at any point in the traversal. 
WALKER is implemented in C using the BSD Unix support libraries. 

FALTER is used to attach a description of a known fault to the data structure cor- 
responding to the program block in which the fault is found. The fault description cor- 
responds to the information described in the fault model in section 3.2, including the 
error-generation conditions and the list of variables with values immediately contaminated 
by the fault. FALTER then translates the internal structure into a LISP expression for sym- 
bolic execution by SPACER. FALTER is implemented in C using the BSD Unix support 
libraries. 

SPACER supports the propagation analysis described in section 3.5. At the start of 
execution, SPACER prompts the analyst to select the fault to be analyzed from the list 
of faults identified through FALTER in the current program. SPACER then symbolically 
executes the program, performing a traversal of the program paths. When the fault lo- 
cation is reached, SPACER initiates the analysis process described in section 3.5, using 
the information provided via FALTER. At each output statement, SPACER reports the 
current propagation conditions for the variables referenced in the output. During this pro- 
cess, SPACER requires user guidance to select paths to traverse and to resolve complex 
expressions. SPACER periodically reports to the user the current state of the analysis and 
the statements being traversed. SPACER is implemented in Allegro Common LISP, using 
portions of the UNISEX symbolic execution package[7]. 
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While each of the above tools may be directly accessed by the analyst, a coordinating 
visual interface is provided by the VIEWER tool[2]. VIEWER presents textual and graphic 
information to facilitate the analyst’s tracking of the analysis process, provides a mouse- 
driven command structure to replace the text command structure supported by the other 
tools, deals with tool initiation and termination, eases consistency between tool usage, and 
provides an extensible command structure to ease common commands. VIEWER is imple- 
mented in C using the SunView user interface libraries. A sample of VIEWER operation 
(one supporting WALKER execution) is given in figure 12. 

5 Conclusions 

This paper has presented a new technique for deriving the bounding conditions for soft- 
ware failure regions. These conditions form the necessary and sufficient conditions for an 
individual program fault to be executed and to lead to a failure. But the technique requires 
that the fault be known prior to the analysis. One testing task in which this knowledge 
would be present is the regression testing task. 

There are two immediately obvious ways in which failure regions analysis aids regression 
testing. First, the failure region offers a framework within which statistical sampling 
techniques can be used to evaluate fault correction. Testers, if they know the limits on 
the useful test cases, can intelligently plan regression tests, rather than iterate previous 
testing. Research is in progress on detailed regression-testing techniques based on sampling 
of failure regions. 

The second way failure regions analysis aids regression testing is by focusing detailed 
attention on the fault. As a review method, failure regions analysis may allow the analyst 
to identify if a proposed correction is inadequate prior to run-time testing. As a planning 
method, failure regions analysis may allow maintenance personnel further insight on the 
fault as they consider alternative corrections to the fault. 
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Walker Window 2.4 




SC := Slope(X0,Y0,CX,CY); 

SC := arctan(SC); 

SB := Slope(X0,Y0,BX,BY); 

SB := arctan(SB); 
if ( (YO > BY) and (XO > BX)) then 
Angle := abs(SD - SA) 
else 

if ((XO < CX) and (YO < CY)) then 
Angle := abs(SA - SD) 

else 

if ((YO > AY) and (XO < AX)) then 
Angle := abs(SC - SB) 

8 1 S8 

if ((YO > DY) and (XO > DX)) then 



c: ' true' 

a: 'here we are and (null)' 

>g Angle 
node: 4 
( 13) ' ' 
c: ' true' 
a: '(null)' 

>n 34 
node : 34 

(35)' if ((XO < CX) and (YO < CY)) then' 

' ((XO < CX) and (YO < CY)) ' 
a:'(null)' 

>0 



frirmottte 'l [Condi ti on] [ Join j [ Goto ] [ Prior ] [ Left ) [ Right J [ Node ] 

Walker input string: 

( Type ) [ Lister ] Lister input: Lister output: 

LCDR Vicki Pbel, USN and CRPT Hedio Monti, USMC 



Figure 12: Sample of VI EWER Operation 




Finally, knowledge of the failure region bounds allows examination of a variety of issues 
relating to the applicability of software testing techniques. Research is in progress that 
applies software failure region analysis to a population of software with known faults and 
examines these issues. In summary, software failure regions analysis offers both assistance 
in portions of the software test effort and insight into a variety of phenomena relating to 
faults and their detection. 
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